Efficient Work Stealing for Portability of Nested Parallelism and Composability of Multithreaded Program

Author

  • Zahir Zainuddin
Abstract

We present a performance evaluation of parallel-for loops implemented with work stealing. The work-stealing parallel-for transforms the parallel loop into a binary tree by divide-and-conquer. Iterations are distributed among the leaf procedures of the binary tree, and parallel execution proceeds by stealing subtrees from the bottom of the tree. Work stealing and divide-and-conquer are used to address the portability problem in nested parallelism and composability. With these techniques, a fine-grained parallel-for can be implemented without incurring a large work overhead. Low work overhead is important because the number of available processors could be smaller than expected. The low overhead and fine granularity of the work-stealing scheduler allow highly parallel processor cores to scale performance. In addition, the approach used in this work makes efficient nested parallelism possible. Because of this low overhead, we show that work stealing and divide-and-conquer deliver good scalability in a nested parallel sparse LU factorization.
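The divide-and-conquer transformation described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `parallel_for`, the `grain` parameter, and the returned list of leaf ranges are assumptions made for illustration, and the two halves run sequentially here, whereas a work-stealing runtime would expose one half to idle workers.

```python
def parallel_for(lo, hi, body, grain=1, leaves=None):
    """Recursively halve [lo, hi) into a binary tree of ranges.

    Hypothetical sketch: this runs on one thread. In a work-stealing
    runtime, one of the two recursive calls would be spawned so that an
    idle worker could steal that subtree.
    """
    grain = max(grain, 1)  # guard against a non-terminating split
    if leaves is None:
        leaves = []
    if hi - lo <= grain:
        # Leaf procedure: execute the iterations assigned to this leaf.
        for i in range(lo, hi):
            body(i)
        if hi > lo:
            leaves.append((lo, hi))
        return leaves
    mid = lo + (hi - lo) // 2
    parallel_for(lo, mid, body, grain, leaves)   # left subtree
    parallel_for(mid, hi, body, grain, leaves)   # right subtree
    return leaves


if __name__ == "__main__":
    visited = []
    # Eight iterations with grain 2 give four leaves of two iterations each.
    print(parallel_for(0, 8, visited.append, grain=2))
    print(visited)
```

Because each split halves the range, the tree depth grows logarithmically with the iteration count, which is one reason fine-grained leaves need not add much scheduling overhead.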

Similar resources

Executing multithreaded programs efficiently

This thesis presents the theory, design, and implementation of Cilk (pronounced “silk”) and Cilk-NOW. Cilk is a C-based language and portable runtime system for programming and executing multithreaded parallel programs. Cilk-NOW is an implementation of the Cilk runtime system that transparently manages resources for parallel programs running on a network of workstations. Cilk is built around a ...

History-Based Adaptive Work Distribution

Exploiting parallelism of increasingly heterogeneous parallel architectures is challenging due to the complexity of parallelism management. To achieve high performance portability whilst preserving high productivity, high-level approaches to parallel programming delegate parallelism management, such as partitioning and work distribution, to the compiler and the run-time system. Random work stea...

Auto-tuned nested parallelism: A way to reduce the execution time of scientific software in NUMA systems

Scientific and engineering problems are solved with large parallel systems. In some cases those systems are NUMA: a large number of cores share a hierarchically organized memory. The kernel of the computation for those problems is BLAS or similar. Efficient use of kernels gives a faster solution of a large range of scientific problems. Auto-tuned nested parallelism: a way to reduce the execution time of sci...

Portable high-performance programs

This dissertation discusses how to write computer programs that attain both high performance and portability, despite the fact that current computer systems have different degrees of parallelism, deep memory hierarchies, and diverse processor architectures. To cope with parallelism portably in high-performance programs, we present the Cilk multithreaded programming system. In the Cilk-5 system,...

Truly Nested Data-Parallelism: Compiling SaC for the MicroGrid Architecture

Data-parallel programming facilitates elegant specification of concurrency. However, the composability of data-parallel operations so far has been constrained by the requirement to have only flat data-parallel operation at runtime. In this paper, we present early results on our work to exploit hardware support for nested concurrency to directly map nested data-parallel operations in high-level s...

Publication date: 2013